Speeding up tandem mass spectrometry based database searching by peptide and spectrum indexing.
Abstract
Database searching is the technique of choice for shotgun proteomics, and to date much research effort has been spent on improving its effectiveness. However, database searching faces a serious efficiency challenge, given the large number of mass spectra and the ever-faster growth of peptide databases resulting from genome translation, enzymatic digestion, and post-translational modifications. In this study, we conducted systematic research on speeding up database search engines for protein identification, using the design of the pFind 2.1 search engine as a running example to illustrate the key points. First, by constructing peptide indexes, pFind achieves a two- to three-fold speedup compared with searching without peptide indexes. Second, by constructing indexes for the observed precursor and fragment ions, pFind achieves a further two-fold speedup. As a result, pFind compares very favorably with predominant search engines such as Mascot, SEQUEST and X!Tandem.
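The abstract names two indexing ideas without spelling them out. The Python sketch below illustrates the general pattern under simplifying assumptions. It is not pFind's implementation, whose data structures are not described here.

The sketch makes these assumptions:

- a simplified trypsin rule (cleave after K or R unless the next residue is P);
- standard monoisotopic residue masses;
- singly charged b/y fragment ions;
- a fixed m/z bin width.

The peptide index digests each protein once, drops duplicate peptides, and sorts the remainder by mass. Candidates for a given precursor mass are then found by binary search rather than by re-digesting the database. The fragment index hashes a spectrum's observed peaks into integer bins, so each theoretical fragment is checked in constant time. All function names and parameter values (`max_missed`, `min_len`, `tol`, `bin_width`) are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Standard monoisotopic residue masses in daltons.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def tryptic_digest(protein, max_missed=1, min_len=4):
    """Simplified trypsin (assumption): cleave after K/R unless next residue is P."""
    sites = [i + 1 for i, aa in enumerate(protein[:-1])
             if aa in "KR" and protein[i + 1] != "P"]
    bounds = [0] + sites + [len(protein)]
    peptides = set()
    for i in range(len(bounds) - 1):
        # Allow up to max_missed skipped cleavage sites per peptide.
        for j in range(i + 1, min(i + 2 + max_missed, len(bounds))):
            pep = protein[bounds[i]:bounds[j]]
            if len(pep) >= min_len:
                peptides.add(pep)
    return peptides


def peptide_mass(seq):
    """Neutral monoisotopic peptide mass: residue masses plus one water."""
    return sum(MONO[aa] for aa in seq) + WATER


def build_peptide_index(proteins, max_missed=1):
    """Digest each protein once, drop duplicate peptides, sort by mass."""
    unique = set()
    for prot in proteins:
        unique |= tryptic_digest(prot, max_missed)
    entries = sorted((peptide_mass(p), p) for p in unique)
    return [m for m, _ in entries], [p for _, p in entries]


def candidates(index, precursor_mass, tol=0.02):
    """Binary-search the mass-sorted index for peptides within +/- tol Da."""
    masses, seqs = index
    lo = bisect_left(masses, precursor_mass - tol)
    hi = bisect_right(masses, precursor_mass + tol)
    return seqs[lo:hi]


def build_fragment_index(peaks, bin_width=0.02):
    """Hash a spectrum's observed fragment m/z values into integer bins."""
    return {round(mz / bin_width) for mz in peaks}


def count_matched_fragments(seq, frag_index, bin_width=0.02):
    """Count singly charged b/y ions whose bin, or an adjacent bin, was observed."""
    matched = 0
    for k in range(1, len(seq)):
        b_ion = sum(MONO[aa] for aa in seq[:k]) + PROTON
        y_ion = sum(MONO[aa] for aa in seq[k:]) + WATER + PROTON
        for ion in (b_ion, y_ion):
            c = round(ion / bin_width)
            if c in frag_index or c - 1 in frag_index or c + 1 in frag_index:
                matched += 1
    return matched


if __name__ == "__main__":
    index = build_peptide_index(["AGLKDFPRSTVEKLLR", "DFPRSTVEK"])
    print(len(index[1]))  # 6: peptides shared by both proteins are stored once
    print(candidates(index, peptide_mass("STVEK")))  # ['STVEK']
```

The index is sorted once, at a cost of O(n log n). After that, each spectrum's precursor lookup costs O(log n + k) instead of a full scan. That trade pays off when many spectra are searched against the same database.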
Similar articles
An efficient algorithm for the blocked pattern matching problem
MOTIVATION Tandem mass spectrometry (MS) has become the method of choice for protein identification and quantification. In the era of big data biology, tandem mass spectra are often searched against huge protein databases generated from genomes or RNA-Seq data for peptide identification. However, most existing tools for MS-based peptide identification compare a tandem mass spectrum against all ...
Speeding up tandem mass spectrometry database search: metric embeddings and fast near neighbor search
MOTIVATION Due to the recent advances in technology of mass spectrometry, there has been an exponential increase in the amount of data being generated in the past few years. Database searches have not been able to keep up with this data explosion. Thus, speeding up the data searches becomes increasingly important in mass-spectrometry-based applications. Traditional database search methods use one-...
Speeding up Scoring Module of Mass Spectrometry Based Protein Identification by GPUs
Database searching is a main method for protein identification in shotgun proteomics, and so far most research effort has been dedicated to improving its effectiveness. However, the efficiency of database searching faces a serious challenge due to the ever-faster growth of protein and peptide databases resulting from genome translations, enzymatic digestions, and post-translational modificatio...
Tandem mass spectrometry is a widely used method for protein and peptide sequence identification. Since mass spectra contain up to 80% noise and many other inaccuracies, there is still a need for more accurate algorithms for mass spectra interpretation. The sizes of protein databases grow rapidly and the methods for indexing these databases in order to interpret mass spectra become ...
Probability-based pattern recognition and statistical framework for randomization: modeling tandem mass spectrum/peptide sequence false match frequencies
MOTIVATION In proteomics, reverse database searching is used to control the false match frequency for tandem mass spectrum/peptide sequence matches, but reversal creates sequences devoid of patterns that usually challenge database-search software. RESULTS We designed an unsupervised pattern recognition algorithm for detecting patterns with various lengths from large sequence datasets. The pat...
Journal title:
- Rapid communications in mass spectrometry : RCM
Volume 24, Issue 6
Publication date: 2010